Efficient Phrase-Table Representation for Machine Translation with Applications to Online MT and Speech Translation
نویسندگان
چکیده
In phrase-based statistical machine translation, the phrase-table requires a large amount of memory. We will present an efficient representation with two key properties: on-demand loading and a prefix tree structure for the source phrases. We will show that this representation scales well to large data tasks and that we are able to store hundreds of millions of phrase pairs in the phrase-table. For the large Chinese– English NIST task, the memory requirements of the phrase-table are reduced to less than 20MB using the new representation with no loss in translation quality and speed. Additionally, the new representation is not limited to a specific test set, which is important for online or real-time machine translation. One problem in speech translation is the matching of phrases in the input word graph and the phrase-table. We will describe a novel algorithm that effectively solves this combinatorial problem exploiting the prefix tree data structure of the phrase-table. This algorithm enables the use of significantly larger input word graphs in a more efficient way resulting in improved translation quality.
منابع مشابه
Combining Multi-Engine Machine Translation and Online Learning through Dynamic Phrase Tables
Extending phrase-based Statistical Machine Translation systems with a second, dynamic phrase table has been done for multiple purposes. Promising results have been reported for hybrid or multi-engine machine translation, i.e.\ building a phrase table from the knowledge of external MT systems, and for online learning. We argue that, in prior research, dynamic phrase tables are not scored optimal...
متن کاملHierarchical Phrase-Based MT for Phonetic Representation-Based Speech Translation
The paper presents a novel technique for speech translation using hierarchical phrasedbased statistical machine translation (HPBSMT). The system is based on translation of speech from phone sequences as opposed to conventional approach of speech translation from word sequences. The technique facilitates speech translation by allowing a machine translation (MT) system to access to phonetic infor...
متن کاملDynamic Models in Moses for Online Adaptation
Avery hot issue for research and industry is how to effectively integratemachine translation (MT)within computer assisted translation (CAT) software. This paper focuses on this issue, and more generally how to dynamically adapt phrase-based statistical machine translation (SMT) by exploiting external knowledge, like the post-editions from professional translators. We present an enhancement of t...
متن کاملPhrase Based Language Model For Statistical Machine Translation
We consider phrase based Language Models (LM), which generalize the commonly used word level models. Similar concept on phrase based LMs appears in speech recognition, which is rather specialized and thus less suitable for machine translation (MT). In contrast to the dependency LM, we first introduce the exhaustive phrase-based LMs tailored for MT use. Preliminary experimental results show that...
متن کاملThe Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007